System and method for generating 3D objects from 2D images of garments

ABSTRACT

A system for generating three-dimensional (3D) objects from two-dimensional (2D) images of garments is presented. The system includes a data module configured to receive a 2D image of a selected garment and a target 3D model. The system further includes a computer vision model configured to generate a UV map of the 2D image of the selected garment. The system moreover includes a training module configured to train the computer vision model based on a plurality of 2D training images and a plurality of ground truth (GT) panels for a plurality of 3D training models. The system furthermore includes a 3D object generator configured to generate a 3D object corresponding to the selected garment based on the UV map generated by a trained computer vision model and the target 3D model. A related method is also presented.

PRIORITY STATEMENT

The present application claims priority under 35 U.S.C. § 119 to Indian patent application number 202141037135 filed Aug. 16, 2021, the entire contents of which are hereby incorporated herein by reference.

BACKGROUND

Embodiments of the present invention generally relate to systems and methods for generating 3D objects from 2D images of garments, and more particularly to systems and methods for generating 3D objects from 2D images of garments using a trained computer vision model.

Online shopping (e-commerce) platforms for fashion items, supported in a contemporary Internet environment, are well known. Shopping for clothing items online via the Internet is growing in popularity because it potentially offers shoppers a broader range of choices of clothing in comparison to earlier off-line boutiques and superstores.

Typically, most fashion e-commerce platforms show catalog images with human models wearing the clothing items. The models are shot in various poses and the images are cataloged on the e-commerce platforms. However, the images are usually presented in a 2D format and thus lack the functionality of a 3D catalog. Moreover, shoppers on e-commerce platforms may want to try out different clothing items on themselves in a 3D format before making an actual online purchase of the item. This will give them the experience of “virtual try-on”, which is not easily available on most e-commerce shopping platforms.

However, the creation of a high-resolution 3D object for a clothing item may require expensive hardware (e.g., human-sized style-cubes, etc.) as well as costly setups in a studio. Further, it may be challenging to render 3D objects for clothing with high-resolution texture. Furthermore, conventional rendering of 3D objects may be time-consuming and not amenable to efficient cataloging in an e-commerce environment.

Thus, there is a need for systems and methods that enable faster and cost-effective 3D rendering of clothing items with high-resolution texture. Further, there is a need for systems and methods that enable the shoppers to virtually try on the clothing items in a 3D setup.

SUMMARY

The following summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, example embodiments, and features described, further aspects, example embodiments, and features will become apparent by reference to the drawings and the following detailed description.

Briefly, according to an example embodiment, a system for generating three-dimensional (3D) objects from two-dimensional (2D) images of garments is presented. The system includes a data module configured to receive a 2D image of a selected garment and a target 3D model. The system further includes a computer vision model configured to generate a UV map of the 2D image of the selected garment. The system moreover includes a training module configured to train the computer vision model based on a plurality of 2D training images and a plurality of ground truth (GT) panels for a plurality of 3D training models. The system furthermore includes a 3D object generator configured to generate a 3D object corresponding to the selected garment based on the UV map generated by a trained computer vision model and the target 3D model.

According to another example embodiment, a system configured to virtually fit garments on consumers by generating three-dimensional (3D) objects from two-dimensional (2D) images of garments is presented. The system includes a 3D consumer model generator configured to generate a 3D consumer model based on one or more inputs provided by a consumer. The system further includes a data module configured to receive a 2D image of a selected garment and the 3D consumer model. The system furthermore includes a computer vision model configured to generate a UV map of the 2D image of the selected garment, and a training module configured to train the computer vision model based on a plurality of 2D training images and a plurality of ground truth (GT) panels for a plurality of 3D training models. The system moreover includes a 3D object generator configured to generate a 3D object corresponding to the selected garment based on the UV map generated by a trained computer vision model and the 3D consumer model, wherein the 3D object is the 3D consumer model wearing the selected garment.

According to another example embodiment, a method for generating three-dimensional (3D) objects from two-dimensional (2D) images of garments is presented. The method includes receiving a 2D image of a selected garment and a target 3D model. The method further includes training a computer vision model based on a plurality of 2D training images and a plurality of ground truth panels for a plurality of 3D training models. The method furthermore includes generating a UV map of the 2D image of the selected garment based on the trained computer vision model, and generating a 3D object corresponding to the selected garment based on the UV map generated by a trained computer vision model and the target 3D model.

BRIEF DESCRIPTION OF THE FIGURES

These and other features, aspects, and advantages of the example embodiments will become better understood when the following detailed description is read with reference to the accompanying drawings in which like characters represent like parts throughout the drawings, wherein:

FIG. 1 is a block diagram illustrating an example system for generating 3D objects from 2D images of garments, according to some aspects of the present description,

FIG. 2 is a block diagram illustrating an example computer vision model, according to some aspects of the present description,

FIG. 3 illustrates an example workflow of a computer vision model, according to some aspects of the present description,

FIG. 4 illustrates example landmark prediction by a landmark and segmental parsing network in 2D images, according to some aspects of the present description,

FIG. 5 illustrates example segmentations by a landmark and segmental parsing network in 2D images, according to some aspects of the present description,

FIG. 6 illustrates an example workflow for a texture mapping network, according to some aspects of the present description,

FIG. 7 illustrates an example workflow for an inpainting network, according to some aspects of the present description,

FIG. 8 illustrates an example workflow for identifying 3D poses by a 3D training model generator, according to some aspects of the present description,

FIG. 9 illustrates an example for draping garment panels on a 3D training model by a 3D training model generator, according to some aspects of the present description,

FIG. 10 illustrates an example workflow for generating training data by a training data generator, according to some aspects of the present description,

FIG. 11 illustrates an example workflow for generating a 3D object from a 2D image using a UV map, according to some aspects of the present description,

FIG. 12 illustrates a flow chart for generating a 3D object from a 2D image using a UV map, according to some aspects of the present description,

FIG. 13 illustrates a flow chart for generating training data, according to some aspects of the present description,

FIG. 14 illustrates a flow chart for generating a UV map from a computer vision model, according to some aspects of the present description,

FIG. 15 illustrates a flow chart for generating a UV map from a computer vision model, according to some aspects of the present description,

FIG. 16 illustrates an example system for virtually fitting garments on consumers, according to some aspects of the present description, and

FIG. 17 is a block diagram illustrating an example computer system, according to some aspects of the present description.

DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS

Various example embodiments will now be described more fully with reference to the accompanying drawings in which only some example embodiments are shown. Specific structural and functional details disclosed herein are merely representative for purposes of describing example embodiments. Example embodiments, however, may be embodied in many alternate forms and should not be construed as limited to only the example embodiments set forth herein. On the contrary, example embodiments are to cover all modifications, equivalents, and alternatives thereof.

The drawings are to be regarded as being schematic representations and elements illustrated in the drawings are not necessarily shown to scale. Rather, the various elements are represented such that their function and general purpose become apparent to a person skilled in the art. Any connection or coupling between functional blocks, devices, components, or other physical or functional units shown in the drawings or described herein may also be implemented by an indirect connection or coupling. A coupling between components may also be established over a wireless connection. Functional blocks may be implemented in hardware, firmware, software, or a combination thereof.

Before discussing example embodiments in more detail, it is noted that some example embodiments are described as processes or methods depicted as flowcharts. Although the flowcharts describe the operations as sequential processes, many of the operations may be performed in parallel, concurrently or simultaneously. In addition, the order of operations may be re-arranged. The processes may be terminated when their operations are completed, but may also have additional steps not included in the figures. It should also be noted that in some alternative implementations, the functions/acts/steps noted may occur out of the order noted in the figures. For example, two figures shown in succession may, in fact, be executed substantially concurrently or may sometimes be executed in the reverse order, depending upon the functionality/acts involved.

Further, although the terms first, second, etc. may be used herein to describe various elements, components, regions, layers and/or sections, it should be understood that these elements, components, regions, layers and/or sections should not be limited by these terms. These terms are used only to distinguish one element, component, region, layer, or section from another region, layer, or section. Thus, a first element, component, region, layer, or section discussed below could be termed a second element, component, region, layer, or section without departing from the scope of example embodiments.

Spatial and functional relationships between elements (for example, between modules) are described using various terms, including “connected,” “engaged,” “interfaced,” and “coupled.” Unless explicitly described as being “direct,” when a relationship between first and second elements is described in the description below, that relationship encompasses a direct relationship where no other intervening elements are present between the first and second elements, and also an indirect relationship where one or more intervening elements are present (either spatially or functionally) between the first and second elements. In contrast, when an element is referred to as being “directly” connected, engaged, interfaced, or coupled to another element, there are no intervening elements present. Other words used to describe the relationship between elements should be interpreted in a like fashion (e.g., “between,” versus “directly between,” “adjacent,” versus “directly adjacent,” etc.).

The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which example embodiments belong. It will be further understood that terms, e.g., those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

As used herein, the singular forms “a,” “an,” and “the,” are intended to include the plural forms as well, unless the context clearly indicates otherwise. As used herein, the terms “and/or” and “at least one of” include any and all combinations of one or more of the associated listed items. It will be further understood that the terms “comprises,” “comprising,” “includes,” and/or “including,” when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

Unless specifically stated otherwise, or as is apparent from the description, terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device/hardware, that manipulates and transforms data represented as physical, electronic quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

Example embodiments of the present description provide systems and methods for generating 3D objects from 2D images of garments using a trained computer vision model. Some embodiments of the present description provide systems and methods to virtually fit garments on consumers by generating 3D objects including 3D consumer models wearing a selected garment.

FIG. 1 illustrates an example system 100 for generating three-dimensional (3D) objects from two-dimensional (2D) images of garments. The system 100 includes a data module 102 and a processor 104. The processor 104 includes a computer vision model 106, a training module 108, and a 3D object generator 110. Each of these components is described in detail below.

The data module 102 is configured to receive a 2D image 10 of a selected garment, a target 3D model 12, and one or more garment panels 13 for the selected garment. Non-limiting examples of a suitable garment may include top-wear, bottom-wear, and the like. The 2D image 10 may be a standalone image of the selected garment in one embodiment. The term “standalone image” as used herein refers to the image of the selected garment by itself and does not include a model or a mannequin. In certain embodiments, the 2D image 10 may be a flat shot image of the selected garment. The flat shot images may be taken from any suitable angle and include top-views, side views, front-views, back-views, and the like. In another embodiment, the 2D image 10 may be an image of a human model or a mannequin wearing the selected garment taken from any suitable angle.
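By way of a non-limiting illustration, the following sketch shows one way the inputs received by the data module 102 could be represented in software. The names (GarmentRequest, load_request) and the use of the Pillow library are illustrative assumptions and are not part of the present description.

```python
from dataclasses import dataclass
from typing import List

import numpy as np
from PIL import Image


@dataclass
class GarmentRequest:
    """Inputs received by the data module (names are illustrative only)."""
    garment_image: np.ndarray          # the 2D image 10, as an H x W x 3 RGB array
    target_model_path: str             # the target 3D model 12, e.g. a mesh file path
    panel_images: List[np.ndarray]     # the garment panels 13 used to build the fixed UV map


def load_request(image_path: str, model_path: str, panel_paths: List[str]) -> GarmentRequest:
    """Read the 2D garment image and the panel images from disk."""
    return GarmentRequest(
        garment_image=np.asarray(Image.open(image_path).convert("RGB")),
        target_model_path=model_path,
        panel_images=[np.asarray(Image.open(p)) for p in panel_paths],
    )
```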

In one embodiment, the 2D image 10 of the selected garment may correspond to a catalog image selected by a consumer on a fashion retail platform (e.g., a fashion e-commerce platform). In such embodiments, the systems and methods described herein provide for virtual fitting of the garment by the consumer. The data module 102 in such instances may be configured to access the fashion retail platform to retrieve the 2D image 10.

In another embodiment, the 2D image 10 of the selected garment may correspond to a 2D image from a fashion e-catalog that needs to be digitized in a 3D form. In such embodiments, the 2D image 10 of the selected garment is stored in a 2D image repository (not shown) either locally (e.g., in a memory coupled to the processor 104) or in a remote location (e.g., cloud storage, offline image repository, and the like). The data module 102 in such instances may be configured to access the 2D image repository to retrieve the 2D image 10.

With continued reference to FIG. 1, the data module 102 is further configured to receive a target 3D model 12. The term “target 3D model” as used herein refers to a 3D model having one or more characteristics that are desired in the generated 3D object. For example, in some embodiments, the target 3D model 12 may include a plurality of 3D catalog models in different poses. In such embodiments, the target 3D model may be stored in a target model repository (not shown) either locally (e.g., in a memory coupled to the processor 104) or in a remote location (e.g., cloud storage, offline image repository, and the like). The data module 102 in such instances may be configured to access the target model repository to retrieve the target 3D model 12.

Alternatively, for embodiments involving consumers virtually trying on the selected garments, the target 3D model 12 may be a 3D consumer model generated based on one or more inputs (e.g., body dimensions, height, body shape, skin tone, and the like) provided by a consumer. In such embodiments, as described in detail later, the system 100 may further include a 3D consumer model generator configured to generate a target 3D model 12 of the consumer, based on the inputs provided. Further, in such embodiments, the data module 102 may be configured to access the target 3D model 12 from the 3D consumer model generator.

The data module 102 is further configured to receive information on one or more garment panels 13 corresponding to the selected garment. The term “garment panel” as used herein refers to panels used by fashion designers to stitch the garment. The one or more garment panels 13 may be used to generate a fixed UV map, as described herein later.

Referring back to FIG. 1, the processor 104 is communicatively coupled to the data module 102. The processor 104 includes a computer vision model 106 configured to generate a UV map 14 of the 2D image 10 of the selected garment. The term “UV mapping” as used herein refers to the 3D modeling process of projecting a 2D image to a 3D model's surface for texture mapping. The term “UV map” as used herein refers to the bidimensional (2D) nature of the process, wherein the letters “U” and “V” denote the axes of the 2D texture.
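The notion of a UV map may be illustrated with a short, hypothetical example: each vertex of a 3D mesh carries a (u, v) coordinate in the unit square, and the rendered surface colour is obtained by sampling a texture image at that coordinate. The sketch below is conceptual only; the coordinate convention (v = 0 at the bottom of the image) is an assumption.

```python
import numpy as np


def sample_texture(texture: np.ndarray, uv: np.ndarray) -> np.ndarray:
    """Nearest-neighbour lookup of texture colours at the given UV coordinates.

    texture: H x W x 3 texture image (e.g., the UV map 14)
    uv:      N x 2 array of (u, v) pairs in [0, 1]; u runs along the width (the
             "U" axis) and v along the height (the "V" axis), with v = 0 at the
             bottom of the image (an assumed convention).
    """
    h, w = texture.shape[:2]
    cols = np.clip(np.round(uv[:, 0] * (w - 1)).astype(int), 0, w - 1)
    rows = np.clip(np.round((1.0 - uv[:, 1]) * (h - 1)).astype(int), 0, h - 1)
    return texture[rows, cols]


# A textured triangle: each mesh vertex stores a (u, v) pair, and rendering
# colours the surface by sampling the texture at those coordinates.
triangle_uvs = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])
```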

The computer vision model 106, as shown in FIG. 2, further includes a landmark and segmental parsing network 116, a texture mapping network 117, and an inpainting network 118. The landmark and segmental parsing network 116 is configured to provide spatial information 22 corresponding to the 2D image 10. The texture mapping network 117 is configured to warp/map the 2D image 10 onto a fixed UV map, based on the spatial information 22 corresponding to the 2D image, to generate a warped image 24. The inpainting network 118 is configured to add texture to one or more occluded portions in the warped image 24 to generate the UV map 14.
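A minimal, non-limiting sketch of how the three networks of the computer vision model 106 could be chained is shown below. The three callables are placeholders standing in for the trained networks; only the data flow described above is represented.

```python
import numpy as np


class GarmentUVPipeline:
    """Chains the three stages of the computer vision model; only the data flow is shown.

    The three callables are placeholders standing in for the trained
    landmark and segmental parsing network 116, texture mapping network 117,
    and inpainting network 118.
    """

    def __init__(self, parsing_net, texture_mapping_net, inpainting_net):
        self.parsing_net = parsing_net
        self.texture_mapping_net = texture_mapping_net
        self.inpainting_net = inpainting_net

    def __call__(self, image: np.ndarray, fixed_uv_map: np.ndarray) -> np.ndarray:
        spatial = self.parsing_net(image)                                # spatial information 22
        warped = self.texture_mapping_net(image, spatial, fixed_uv_map)  # warped image 24
        return self.inpainting_net(warped)                               # UV map 14
```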

This is further illustrated in FIG. 3, wherein the 2D image 10 is an image of a model wearing a shirt as the selected garment. Spatial information 22 corresponding to the 2D image 10 is provided by the landmark and segmental parsing network 116, as shown in FIG. 3. The 2D image 10 is mapped/warped on the fixed UV map 15 by the texture mapping network 117, based on the spatial information 22, to generate the warped image 24. The fixed UV map 15 corresponds to one or more garment panels 13 for the selected garment (e.g., the shirt in the 2D image 10), as mentioned earlier. The fixed UV map 15 may be generated by a fixed UV map generator (not shown in the Figures). Further, texture is added to one or more occluded portions 23 in the warped image 24 by the inpainting network 118 to generate the UV map 14.

Non-limiting examples of a suitable landmark and segmental parsing network 116 include a deep learning neural network. Non-limiting examples of a suitable texture mapping network 117 include a computer vision model such as a thin plate spline (TPS) model. Non-limiting examples of a suitable inpainting network 118 include a deep learning neural network.

Referring now to FIGS. 4 and 5, the landmark and segmental parsing network 116 is configured to provide a plurality of inferred control points corresponding to the 2D image 10, and the texture mapping network 117 is configured to map the 2D image 10 onto the fixed UV map 15 based on the plurality of inferred control points and a plurality of corresponding fixed control points on the fixed UV map 15.

The spatial information 22 provided by the landmark and segmental parsing network 116 includes landmark predictions 25 (as shown in FIG. 4) and segment predictions 26 (as shown in FIG. 5). The landmarks 25 (as shown by numbers 1-13 in FIG. 4) are used as the inferred control points by the texture mapping network 117 to warp (or map) the 2D image 10 onto the fixed UV map 15.

The landmark and segmental parsing network 116 is further configured to generate a segmented garment mask, and the texture mapping network 117 is configured to mask the 2D image 10 with the segmented garment mask and map the masked 2D image onto the fixed UV map 15 based on the plurality of inferred control points. This is further illustrated in FIG. 6, wherein the segmented garment mask 27 is generated from the 2D image 10 by the landmark and segmental parsing network 116. The input image 10 is masked with the segmented garment mask 27 to generate the masked 2D image 28 by the texture mapping network 117.

The texture mapping network 117 is further configured to warp/map the masked 2D image 28 onto the fixed UV map 15, based on the plurality of inferred control points, to generate the warped image 24. Thus, the texture mapping network 117 is configured to map only segmented pixels, which helps in reducing occlusions (caused by hands/other garment articles). Further, the texture mapping network 117 allows for interpolation of texture at high resolution.
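The warp performed by the texture mapping network 117 may be illustrated with a thin-plate-spline example. The sketch below uses SciPy's RBFInterpolator with the thin-plate-spline kernel as a stand-in for the network; it performs a backward warp, asking, for every pixel of the fixed UV map, where in the masked 2D image its colour should come from. The function name and the nearest-neighbour sampling are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator


def tps_warp_to_uv(masked_image: np.ndarray,
                   inferred_points: np.ndarray,   # K x 2 (x, y) landmarks in the 2D image
                   fixed_points: np.ndarray,      # K x 2 (x, y) control points on the fixed UV map
                   uv_shape: tuple) -> np.ndarray:
    """Thin-plate-spline warp of the masked garment pixels onto the fixed UV map.

    Backward warp: for every pixel of the output UV map, the TPS tells us
    which source location in the input image to copy the colour from.
    """
    h, w = uv_shape
    # TPS mapping from fixed-UV-map coordinates to image coordinates
    tps = RBFInterpolator(fixed_points, inferred_points, kernel='thin_plate_spline')
    ys, xs = np.mgrid[0:h, 0:w]
    grid = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
    src = tps(grid)                                  # (h*w, 2) source (x, y) locations
    sx = np.clip(np.round(src[:, 0]).astype(int), 0, masked_image.shape[1] - 1)
    sy = np.clip(np.round(src[:, 1]).astype(int), 0, masked_image.shape[0] - 1)
    return masked_image[sy, sx].reshape(h, w, -1)
```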

As noted earlier, the inpainting network 118 is configured to add texture to one or more occluded portions in the warped image 24 to generate the UV map 14. This is further illustrated in FIG. 7, where texture is added to occluded portions 23 in the warped image 24 to generate the UV map 14.

The inpainting network 118 is further configured to infer the texture that is not available in the 2D image 10. According to embodiments of the present description, the texture is inferred by the inpainting network 118 by training the computer vision model 106 using synthetically generated data. The synthetic data for training the computer vision model 106 is generated based on a plurality of 2D training images and a plurality of ground truth (GT) panels for a plurality of 3D training models, as described below.
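As a non-limiting illustration of the inpainting step, the sketch below fills the occluded regions of the warped image using OpenCV's classical inpainting. The description contemplates a trained deep inpainting network for this step; the classical routine is used here only to make the inputs and outputs concrete.

```python
import cv2
import numpy as np


def fill_occlusions(warped: np.ndarray, occlusion_mask: np.ndarray) -> np.ndarray:
    """Fill occluded regions of the warped image (24) to produce a complete UV map.

    A trained deep inpainting network (118) is used in the description; OpenCV's
    classical inpainting is only a stand-in here. `occlusion_mask` is non-zero
    wherever garment texture is missing.
    """
    mask = (occlusion_mask > 0).astype(np.uint8)
    return cv2.inpaint(warped.astype(np.uint8), mask, 3, cv2.INPAINT_TELEA)
```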

Referring again to FIG. 1, the processor 104 further includes a training module 108 configured to train the computer vision model 106 based on a plurality of 2D training images and a plurality of ground truth (GT) panels for a plurality of 3D training models. In some embodiments, the system 100 may further include a 3D training model generator 112 and a training data generator 114, as shown in FIG. 1.

The 3D training model generator 112 is configured to generate the plurality of 3D training models based on a plurality of target model poses and garment panel data. The 3D training model generator 112 is further configured to generate 3D draped garments on various 3D human bodies at scale. In some embodiments, the 3D training model generator 112 includes a 3D creation suite tool configured to create the 3D training models.

As shown in FIG. 8, the 3D training model generator 112 is first configured to identify a 3D pose 32 of a training model 30, and drape the garment onto the training model 30 in a specific pose. The 3D training model generator 112 is further configured to drape the garment onto the 3D training model 30 by using the information available in clothing panels 34 used by the fashion designers while stitching the garment, as shown in FIG. 9.

Referring again to FIG. 1, the training data generator 114 is communicatively coupled with the 3D training model generator 112, and configured to generate the plurality of GT panels and the plurality of 2D training images, based on UV maps. This is further illustrated in FIG. 10. As shown in FIG. 10, a 3D training model 30 is placed in a lighted scene 36 along with a camera to generate a training UV map 38 and a 2D training image 40.

The training data generator 114 is configured to use the training UV map 38 to encode the garment texture associated with the 3D training model 30 and for creating a corresponding GT panel. The training data generator 114 is configured to generate a plurality of GT panels and a plurality of 2D training images by varying one or more of model poses, lighting conditions, garment textures, garment colours, or camera angles for a plurality of 3D training models.
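The variation of rendering factors described above may be sketched, in a non-limiting way, as a simple configuration sweep. The render_scene callable below is a placeholder for the 3D creation suite that places the 3D training model in the lighted scene 36 with a camera; its interface is an assumption made purely for illustration.

```python
import itertools
from typing import Callable, Dict, Iterable, Iterator, Tuple


def training_configurations(poses, lights, textures, colours, camera_angles) -> Iterator[Dict]:
    """Enumerate every combination of the factors varied by the training data generator 114."""
    for pose, light, tex, colour, cam in itertools.product(poses, lights, textures, colours, camera_angles):
        yield {"pose": pose, "lighting": light, "texture": tex, "colour": colour, "camera": cam}


def generate_training_pairs(configs: Iterable[Dict], render_scene: Callable) -> Iterator[Tuple]:
    """Render one (2D training image 40, training UV map 38) pair per configuration.

    `render_scene` stands in for the rendering tool of FIG. 10 and is assumed
    to return (training_image, training_uv_map); the training UV map encodes
    the garment texture used to create the corresponding GT panel.
    """
    for cfg in configs:
        image, uv_map = render_scene(cfg)
        yield image, uv_map
```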

Thus, according to embodiments of the present description, the computer vision model 106 is trained using synthetic data generated by the training data generator 114. Therefore, the trained computer vision model 106 is configured to generate a UV map that is a learned UV map, i.e., the UV map is generated based on the training imparted to the computer vision model 106.

With continued reference to FIG. 1, the processor 104 further includes a 3D object generator 110 configured to generate a 3D object corresponding to the selected garment based on the UV map generated by the trained computer vision model 106 and the target 3D model 12. This is further illustrated in FIG. 11, where a plurality of 3D objects 20 is generated based on a UV map 14 generated from the 2D image 10. As shown in FIG. 11, the plurality of 3D objects 20 corresponds to a 3D model wearing the selected garment in different poses. In some embodiments, the plurality of 3D objects may correspond to a 3D e-catalog model wearing the selected garment in different poses. In some other embodiments, the plurality of 3D objects may correspond to a 3D consumer model wearing the selected garment in different poses.
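A non-limiting sketch of this final texturing step is shown below using the trimesh library: the learned UV map is attached as the texture of the target 3D model, assuming the target mesh already carries per-vertex UV coordinates laid out according to the same fixed panel layout. The library choice and function names are illustrative, not prescriptive.

```python
import numpy as np
import trimesh
from PIL import Image


def texture_target_model(target_mesh_path: str, uv_map_image: np.ndarray, out_path: str) -> None:
    """Apply the learned UV map (14) as the texture of the target 3D model (12).

    Assumes the mesh file already stores texture coordinates (e.g., an OBJ with
    `vt` entries) that follow the fixed panel layout used during training.
    """
    mesh = trimesh.load(target_mesh_path, force='mesh')
    texture = Image.fromarray(uv_map_image.astype(np.uint8))
    mesh.visual = trimesh.visual.texture.TextureVisuals(uv=mesh.visual.uv, image=texture)
    mesh.export(out_path)   # e.g., a .glb/.obj that a viewer can render in different poses
```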

The manner of implementation of the system 100 of FIG. 1 is described below in FIGS. 12-15.

FIG. 12 is a flowchart illustrating a method 200 for generating three-dimensional (3D) objects from two-dimensional (2D) images of garments. The method 200 may be implemented using the system 100 of FIG. 1, according to some aspects of the present description. Each step of the method 200 is described in detail below.

The method 200 includes, at step 202, receiving a 2D image of a selected garment and a target 3D model. The 2D image may be a standalone image of the selected garment in one embodiment. The term “standalone image” as used herein refers to the image of the selected garment by itself and does not include a model or a mannequin. In another embodiment, the 2D image may be an image of a model or a mannequin wearing the selected garment taken from any suitable angle.

In one embodiment, the 2D image of the selected garment may correspond to a catalog image selected by a consumer on a fashion retail platform (e.g., a fashion e-commerce platform). In another embodiment, the 2D image of the selected garment may correspond to a 2D image from a fashion e-catalog that needs to be digitized in a 3D form.

The term “target 3D model” as used herein refers to a 3D model having one or more characteristics that are desired in the generated 3D object. For example, in some embodiments, the target 3D model may include a plurality of 3D catalog models in different poses. Alternatively, for embodiments involving consumers virtually trying on the selected garments, the target 3D model may be a 3D consumer model generated based on one or more inputs provided by a consumer. In such embodiments, the method 200 may further include generating a target 3D model of the consumer, based on the inputs provided.

Referring again to FIG. 12, the method 200 includes, at step 204, training a computer vision model based on a plurality of 2D training images and a plurality of ground truth panels for a plurality of 3D training models.

In some embodiments, the method 200 further includes, at step 201, generating a plurality of 3D training models based on a plurality of target model poses and garment panel data, as shown in FIG. 13. The method 200 furthermore includes, at step 203, generating the plurality of ground truth (GT) panels and the plurality of 2D training images, based on UV maps, by varying one or more of model poses, lighting conditions, garment textures, garment colours, or camera angles for the plurality of 3D training models, as shown in FIG. 13. The implementation of steps 201 and 203 has been described herein earlier with reference to FIG. 10.

Referring again to FIG. 12, the method 200 includes, at step 206, generating a UV map of the 2D image of the selected garment based on the trained computer vision model. As noted earlier, the computer vision model includes a landmark and segmental parsing network, a texture mapping network, and an inpainting network. Non-limiting examples of a suitable landmark and segmental parsing network include a deep learning neural network. Non-limiting examples of a suitable texture mapping network include a computer vision model such as a thin plate spline (TPS) model. Non-limiting examples of a suitable inpainting network include a deep learning neural network.

The implementation of step 206 of method 200 is further described in FIG. 14. The step 206 further includes, at block 210, providing spatial information corresponding to the 2D image. The step 206 further includes, at block 212, warping/mapping the 2D image onto a fixed UV map, based on the spatial information corresponding to the 2D image, to generate a warped image. The step 206 further includes, at block 214, adding texture to one or more occluded portions in the warped image to generate the UV map. The fixed UV map corresponds to one or more garment panels for the selected garment, as mentioned earlier. The step 206 may further include generating the fixed UV map based on the one or more garment panels (not shown in figures).
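Generation of the fixed UV map from the garment panels may be sketched, in a non-limiting way, as packing the panel shapes into a single atlas, as below. The left-to-right grid layout is an illustrative assumption; any fixed layout may be used, provided training and inference share the same one.

```python
import numpy as np


def build_fixed_uv_map(panel_masks, atlas_size=(1024, 1024)) -> np.ndarray:
    """Pack the garment panels into one fixed UV atlas.

    `panel_masks` is a list of H x W binary arrays, one per garment panel 13.
    Panels are laid out left-to-right on a fixed grid; a production layout would
    follow the designer's panel arrangement.
    """
    atlas = np.zeros(atlas_size, dtype=np.uint8)
    cell_w = atlas_size[1] // max(len(panel_masks), 1)
    for i, panel in enumerate(panel_masks):
        ph, pw = panel.shape[:2]
        scale = min(atlas_size[0] / ph, cell_w / pw)
        new_h, new_w = int(ph * scale), int(pw * scale)
        # nearest-neighbour resize of the panel into its grid cell
        rows = (np.arange(new_h) / scale).astype(int)
        cols = (np.arange(new_w) / scale).astype(int)
        atlas[:new_h, i * cell_w : i * cell_w + new_w] = panel[rows][:, cols]
    return atlas
```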

The spatial information provided by the landmark and segmental parsing network includes landmark predictions (as described earlier with reference to FIG. 4) and segment predictions (as described earlier with reference to FIG. 5). The landmarks (as shown by numbers 1-13 in FIG. 4) are used as the inferred control points by the texture mapping network to warp (or map) the 2D image onto the fixed UV map.

Referring now to FIG. 15, the step 206 of generating the UV map includes, at block 216, providing a plurality of inferred control points corresponding to the 2D image. At block 218, the step 206 includes generating a segmented garment mask based on the 2D image. The step 206 further includes, at block 220, masking the 2D image with the segmented garment mask. At block 222, the step 206 includes warping/mapping the masked 2D image on the fixed UV map based on the plurality of inferred control points and a plurality of fixed control points on the fixed UV map to generate the warped image.

The step 206 further includes, at block 224, adding texture to one or more occluded portions in the warped image to generate the UV map. According to embodiments of the present description, the texture is inferred and added to the occluded portions by training the computer vision model using synthetically generated data, as mentioned earlier. The manner of implementation of step 206 is described herein earlier with reference to FIGS. 3-7.

Referring again to FIG. 12, the method 200 includes, at step 208, generating a 3D object corresponding to the selected garment based on the UV map generated by a trained computer vision model and the target 3D model. In some embodiments, the plurality of 3D objects may correspond to a 3D e-catalog model wearing the selected garment in different poses. In some other embodiments, the plurality of 3D objects may correspond to a 3D consumer model wearing the selected garment in different poses.

In some embodiments, a system to virtually fit garments on consumers by generating three-dimensional (3D) objects from two-dimensional (2D) images of garments is presented.

FIG. 16 illustrates an example system 300 for virtually fitting garments on consumers by generating three-dimensional (3D) objects from two-dimensional (2D) images of garments. The system 300 includes a data module 102, a processor 104, and a 3D consumer model generator 120. The processor 104 includes a computer vision model 106, a training module 108, and a 3D object generator 110.

The 3D consumer model generator 120 is configured to generate a 3D consumer model based on one or more inputs provided by a consumer. The data module 102 is configured to receive a 2D image of a selected garment and the 3D consumer model from the 3D consumer model generator. The computer vision model 106 is configured to generate a UV map of the 2D image of the selected garment.
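A non-limiting sketch of the inputs collected from the consumer and their use by the 3D consumer model generator 120 is shown below. The dataclass fields and the body_model_fitter placeholder are illustrative assumptions; any parametric body model may be substituted.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ConsumerInputs:
    """Inputs collected from the consumer (e.g., via the input selection panel 124)."""
    height_cm: float
    chest_cm: float
    waist_cm: float
    hips_cm: float
    body_shape: str      # e.g., "hourglass", "rectangle"
    skin_tone: str       # e.g., a named tone or hex colour


def generate_consumer_model(inputs: ConsumerInputs, body_model_fitter: Callable):
    """Produce the 3D consumer model used as the target 3D model.

    `body_model_fitter` is a placeholder for whatever parametric body model the
    3D consumer model generator 120 relies on; only the interface is sketched.
    """
    return body_model_fitter(
        height=inputs.height_cm,
        measurements=(inputs.chest_cm, inputs.waist_cm, inputs.hips_cm),
        shape=inputs.body_shape,
        skin_tone=inputs.skin_tone,
    )
```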

The training module 108 is configured to train the computer vision model based on a plurality of 2D training images and a plurality of ground truth (GT) panels for a plurality of 3D training models. The 3D object generator 110 is configured to generate a 3D object corresponding to the selected garment based on the UV map generated by a trained computer vision model and the 3D consumer model, wherein the 3D object is the 3D consumer model wearing the selected garment. Each of these components is described earlier with reference to FIG. 1.

The system 300 may further include a user interface 122 for the consumer to provide inputs as well as select a garment for virtual fitting, as shown in FIG. 16. FIG. 16 illustrates an example user interface 122 where the consumer may provide one or more inputs such as body dimensions, height, body shape, and skin tone using the input selection panel 124. As shown in FIG. 16, the consumer may further select one or more garments and corresponding sizes for virtual fitting using the garment selection panel 126. The 3D visual interface 128 further allows the consumer to visualize the 3D consumer model 20 wearing the selected garment, as shown in FIG. 16. The 3D visual interface 128 in such embodiments may be communicatively coupled with the 3D object generator 110.

Embodiments of the present description provide systems and methods for generating 3D objects from 2D images using a computer vision model trained using synthetically generated data. The synthetic training data is generated by first draping garments on various 3D human bodies at scale, using the information available in the clothing panels used by the fashion designers while stitching the garments. The resulting 3D training models are employed to generate a plurality of ground truth panels and a plurality of 2D training images by encoding the garment texture in training UV maps generated from the 3D training models. The synthetic data thus generated is capable of training the computer vision model to generate high-resolution 3D objects with the corresponding clothing texture.

The systems and methods described herein may be partially or fully implemented by a special purpose computer system created by configuring a general-purpose computer to execute one or more particular functions embodied in computer programs. The functional blocks and flowchart elements described above serve as software specifications, which may be translated into the computer programs by the routine work of a skilled technician or programmer.

The computer programs include processor-executable instructions that are stored on at least one non-transitory computer-readable medium such that, when run on a computing device, they cause the computing device to perform any one of the aforementioned methods. The medium also includes, alone or in combination with the program instructions, data files, data structures, and the like. Non-limiting examples of the non-transitory computer-readable medium include, but are not limited to, rewriteable non-volatile memory devices (including, for example, flash memory devices, erasable programmable read-only memory devices, or mask read-only memory devices), volatile memory devices (including, for example, static random access memory devices or dynamic random access memory devices), magnetic storage media (including, for example, an analog or digital magnetic tape or a hard disk drive), and optical storage media (including, for example, a CD, a DVD, or a Blu-ray Disc). Examples of media with a built-in rewriteable non-volatile memory include, but are not limited to, memory cards, and media with a built-in ROM include, but are not limited to, ROM cassettes, etc. Program instructions include both machine codes, such as produced by a compiler, and higher-level codes that may be executed by the computer using an interpreter. The described hardware devices may be configured to execute one or more software modules to perform the operations of the above-described example embodiments of the description, or vice versa.

Non-limiting examples of computing devices include a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor or any device which may execute instructions and respond. A central processing unit may implement an operating system (OS) or one or more software applications running on the OS. Further, the processing unit may access, store, manipulate, process and generate data in response to the execution of software. It will be understood by those skilled in the art that although a single processing unit may be illustrated for convenience of understanding, the processing unit may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the central processing unit may include a plurality of processors or one processor and one controller. Also, the processing unit may have a different processing configuration, such as a parallel processor.

The computer programs may also include or rely on stored data. The computer programs may encompass a basic input/output system (BIOS) that interacts with hardware of the special purpose computer, device drivers that interact with particular devices of the special purpose computer, one or more operating systems, user applications, background services, background applications, etc.

The computer programs may include: (i) descriptive text to be parsed, such as HTML (hypertext markup language) or XML (extensible markup language), (ii) assembly code, (iii) object code generated from source code by a compiler, (iv) source code for execution by an interpreter, (v) source code for compilation and execution by a just-in-time compiler, etc. As examples only, source code may be written using syntax from languages including C, C++, C#, Objective-C, Haskell, Go, SQL, R, Lisp, Java®, Fortran, Perl, Pascal, Curl, OCaml, Javascript®, HTML5, Ada, ASP (active server pages), PHP, Scala, Eiffel, Smalltalk, Erlang, Ruby, Flash®, Visual Basic®, Lua, and Python®.

One example of a computing system 400 is described below in FIG. 17. The computing system 400 includes one or more processors 402, one or more computer-readable RAMs 404 and one or more computer-readable ROMs 406 on one or more buses 408. Further, the computing system 400 includes a tangible storage device 410 that may be used to execute the operating system 420 and the 3D object generation system 100. Both the operating system 420 and the 3D object generation system 100 are executed by the processor 402 via one or more respective RAMs 404 (which typically include cache memory). The execution of the operating system 420 and/or the 3D object generation system 100 by the processor 402 configures the processor 402 as a special-purpose processor configured to carry out the functionalities of the operating system 420 and/or the 3D object generation system 100, as described above.

Examples of storage devices 410 include semiconductor storage devices such as ROM 406, EPROM, flash memory or any other computer-readable tangible storage device that may store a computer program and digital information.

Computing system 400 also includes a R/W drive or interface 412 to read from and write to one or more portable computer-readable tangible storage devices 426 such as a CD-ROM, DVD, memory stick or semiconductor storage device. Further, network adapters or interfaces 414 such as TCP/IP adapter cards, wireless Wi-Fi interface cards, or 3G or 4G wireless interface cards or other wired or wireless communication links are also included in the computing system 400.

In one example embodiment, the 3D object generation system 100 may be stored in the tangible storage device 410 and may be downloaded from an external computer via a network (for example, the Internet, a local area network or another wide area network) and network adapter or interface 414.

Computing system 400 further includes device drivers 416 to interface with input and output devices. The input and output devices may include a computer display monitor 418, a keyboard 422, a keypad, a touch screen, a computer mouse 424, and/or some other suitable input device.

In this description, including the definitions mentioned earlier, the term ‘module’ may be replaced with the term ‘circuit.’ The term ‘module’ may refer to, be part of, or include processor hardware (shared, dedicated, or group) that executes code and memory hardware (shared, dedicated, or group) that stores code executed by the processor hardware. The term code, as used above, may include software, firmware, and/or microcode, and may refer to programs, routines, functions, classes, data structures, and/or objects.

Shared processor hardware encompasses a single microprocessor that executes some or all code from multiple modules. Group processor hardware encompasses a microprocessor that, in combination with additional microprocessors, executes some or all code from one or more modules. References to multiple microprocessors encompass multiple microprocessors on discrete dies, multiple microprocessors on a single die, multiple cores of a single microprocessor, multiple threads of a single microprocessor, or a combination of the above. Shared memory hardware encompasses a single memory device that stores some or all code from multiple modules. Group memory hardware encompasses a memory device that, in combination with other memory devices, stores some or all code from one or more modules.

In some embodiments, the module may include one or more interface circuits. In some examples, the interface circuits may include wired or wireless interfaces that are connected to a local area network (LAN), the Internet, a wide area network (WAN), or combinations thereof. The functionality of any given module of the present description may be distributed among multiple modules that are connected via interface circuits. For example, multiple modules may allow load balancing. In a further example, a server (also known as remote, or cloud) module may accomplish some functionality on behalf of a client module.

While only certain features of several embodiments have been illustrated and described herein, many modifications and changes will occur to those skilled in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the scope of the invention and the appended claims.

CLAIMS

1. A system for generating three-dimensional (3D) objects from two-dimensional (2D) images of garments, the system comprising: a data module configured to receive a 2D image of a selected garment and a target 3D model; a computer vision model configured to generate a UV map of the 2D image of the selected garment; a training module configured to train the computer vision model based on a plurality of 2D training images and a plurality of ground truth (GT) panels for a plurality of 3D training models; and a 3D object generator configured to generate a 3D object corresponding to the selected garment based on the UV map generated by a trained computer vision model and the target 3D model.

2. The system of claim 1, wherein the computer vision model comprises: a landmark and segmental parsing network configured to provide spatial information corresponding to the 2D image; a texture mapping network configured to map the 2D image onto a fixed UV map based on the spatial information corresponding to the 2D image to generate a warped image; and an inpainting network configured to add texture to one or more occluded portions in the warped image to generate the UV map.
3. The system of claim 2, wherein the landmark and segmental parsing network is configured to provide a plurality of inferred control points corresponding to the 2D image, and the texture mapping network is configured to map the 2D image onto the fixed UV map based on the plurality of inferred control points and a plurality of corresponding fixed control points on the fixed UV map.
4. The system of claim 3, wherein the landmark and segmental parsing network is further configured to generate a segmented garment mask, and the texture mapping network is configured to mask the 2D image with the segmented garment mask and map the masked 2D image onto the fixed UV map based on the plurality of inferred control points.
5. The system of claim 1, further comprising a training data generator configured to generate the plurality of GT panels and the plurality of 2D training images, based on UV maps, by varying one or more of model poses, lighting conditions, garment textures, garment colours, or camera angles for the plurality of 3D training models.
6. The system of claim 5, further comprising a 3D training model generator configured to generate the plurality of 3D training models based on a plurality of target model poses and garment panel data.
7. The system of claim 1, wherein the target 3D model comprises a plurality of 3D catalog models in different poses.
8. The system of claim 1, wherein the target 3D model is a 3D consumer model generated based on one or more of body dimensions, height, body shape, and skin tone provided by a consumer.
9. A system configured to virtually fit garments on consumers by generating three-dimensional (3D) objects from two-dimensional (2D) images of garments, the system comprising: a 3D consumer model generator configured to generate a 3D consumer model based on one or more inputs provided by a consumer; a data module configured to receive a 2D image of a selected garment and the 3D consumer model; a computer vision model configured to generate a UV map of the 2D image of the selected garment; a training module configured to train the computer vision model based on a plurality of 2D training images and a plurality of ground truth (GT) panels for a plurality of 3D training models; and a 3D object generator configured to generate a 3D object corresponding to the selected garment based on the UV map generated by a trained computer vision model and the 3D consumer model, wherein the 3D object is the 3D consumer model wearing the selected garment.
10. The system of claim 9, wherein the computer vision model comprises: a landmark and segmental parsing network configured to provide spatial information corresponding to the 2D image; a texture mapping network configured to map the 2D image onto a fixed UV map based on the spatial information corresponding to the 2D image to generate a warped image; and an inpainting network configured to add texture to one or more occluded portions in the warped image to generate the UV map.

11. The system of claim 10, wherein the landmark and segmental parsing network is configured to provide a plurality of inferred control points corresponding to the 2D image, and the texture mapping network is configured to map the 2D image onto the fixed UV map based on the plurality of inferred control points and a plurality of corresponding fixed control points on the fixed UV map.
12. The system of claim 11, wherein the landmark and segmental parsing network is further configured to generate a segmented garment mask, and the texture mapping network is configured to mask the 2D image with the segmented garment mask and map the masked 2D image onto the fixed UV map based on the plurality of inferred control points.
13. The system of claim 8, further comprising a training data generator configured to generate the plurality of ground truth (GT) panels and 2D training images, based on UV maps, by varying one or more of model poses, lighting conditions, garment textures, garment colours, or camera angles for the plurality of 3D training models.
14. A method for generating three-dimensional (3D) objects from two-dimensional (2D) images of garments, the method comprising: receiving a 2D image of a selected garment and a target 3D model; training a computer vision model based on a plurality of 2D training images and a plurality of ground truth panels for a plurality of 3D training models; generating a UV map of the 2D image of the selected garment based on the trained computer vision model; and generating a 3D object corresponding to the selected garment based on the UV map generated by a trained computer vision model and the target 3D model.

15. The method of claim 14, wherein the computer vision model comprises: a landmark and segmental parsing network configured to provide spatial information corresponding to the 2D image; a texture mapping network configured to map the 2D image onto a fixed UV map based on the spatial information corresponding to the 2D image to generate a warped image; and an inpainting network configured to add texture to one or more occluded portions in the warped image to generate the UV map.
16. The method of claim 15, wherein the landmark and segmental parsing network is configured to provide a plurality of inferred control points corresponding to the 2D image, and the texture mapping network is configured to map the 2D image onto the fixed UV map based on the plurality of inferred control points and a plurality of corresponding fixed control points on the fixed UV map.
17. The method of claim 16, wherein the landmark and segmental parsing network is further configured to generate a segmented garment mask, and the texture mapping network is configured to mask the 2D image with the segmented garment mask and map the masked 2D image onto the fixed UV map based on the plurality of inferred control points.
18. The method of claim 14, further comprising generating the plurality of ground truth (GT) panels and the plurality of 2D training images, based on UV maps, by varying one or more of model poses, lighting conditions, garment textures, garment colours, or camera angles for the plurality of 3D training models.
19. The method of claim 14, wherein the target 3D model comprises a plurality of 3D catalog models in different poses.
20. The method of claim 14, wherein the target 3D model is a 3D consumer model generated based on one or more of body dimensions, height, body shape, and skin tone provided by a consumer.